BTW, in case anyone wants to kick the tires and test their Japanese (日本語), I have our Shisa V2 405B model up and running temporarily: https://chat.shisa.ai/
8b-class-japanese-models
shisa-ai/shisa-v2-qwen2.5-7b Text Generation • Updated Apr 16 • 90 • 5
shisa-ai/shisa-v2-llama3.1-8b Text Generation • Updated Apr 16 • 57 • 1
shisa-ai/shisa-v2-llama3.1-8b-preview Updated Apr 15 • 3
sbintuitions/sarashina2.2-3b-instruct-v0.1 Text Generation • Updated Mar 5 • 4.86k • 22
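For anyone who prefers to kick the tires locally rather than through the chat link above, here is a minimal sketch of loading one of the listed models with the standard Hugging Face transformers workflow. The model ID is taken from the collection above; the Japanese prompt and generation settings are purely illustrative assumptions, not a recommended configuration.

```python
# Minimal sketch (illustrative, not an official recipe): load a listed
# 8B-class Japanese model and run one chat-style generation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shisa-ai/shisa-v2-llama3.1-8b"  # from the collection above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Example Japanese prompt: "Please briefly explain machine learning."
messages = [{"role": "user", "content": "機械学習について簡単に説明してください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```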
speed
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 257
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 44
Accelerating LLM Inference with Staged Speculative Decoding Paper • 2308.04623 • Published Aug 8, 2023 • 25
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale Paper • 2208.07339 • Published Aug 15, 2022 • 5
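The last entry in this collection, LLM.int8(), corresponds to the 8-bit loading path exposed in transformers via bitsandbytes. A minimal sketch, assuming bitsandbytes is installed and reusing a model ID from the collection further up; everything beyond load_in_8bit is left at library defaults.

```python
# Minimal sketch (illustrative): 8-bit weight loading in the spirit of the
# LLM.int8() paper listed above, via transformers + bitsandbytes.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "shisa-ai/shisa-v2-llama3.1-8b",  # reused from the collection above
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
# Rough sanity check: int8 weights should take roughly half the memory of fp16.
print(model.get_memory_footprint())
```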